Prague Czech-English Dependency Treebank: Any Hopes For A Common Annotation Scheme?

نویسندگان

  • Martin Cmejrek
  • Jan Curin
  • Jiri Havelka
چکیده

The Prague Czech-English Dependency Treebank (PCEDT) is a new syntactically annotated Czech-English parallel resource. The Penn Treebank has been translated to Czech, and its annotation automatically transformed into dependency annotation scheme. The dependency annotation of Czech is done from plain text by automatic procedures. A small subset of corresponding Czech and English sentences has been annotated by humans. We discuss some of the problems we have experienced during the automatic transformation between annotation schemes and hint at some of the difficulties to be tackled by potential guidelines for dependency annotation of English.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Coreference in Prague Czech-English Dependency Treebank

We present coreference annotation on parallel Czech-English texts of the Prague Czech-English Dependency Treebank (PCEDT). The paper describes innovations made to PCEDT 2.0 concerning coreference, as well as the coreference information already present there. We characterize the coreference annotation scheme, give the statistics and compare our annotation with the coreference annotation in Onton...

متن کامل

Prague Czech-English Dependency Treebank. Syntactically Annotated Resources for Machine Translation

This paper introduces the Prague Czech-English Dependency Treebank (PCEDT), a new Czech-English parallel resource suitable for experiments in structural machine translation. We describe the process of building the core parts of the resources – a bilingual syntactically annotated corpus and translation dictionaries. A part of the Penn Treebank has been translated into Czech, the dependency annot...

متن کامل

(Pre-)Annotation of Topic-Focus Articulation in Prague Czech-English Dependency Treebank

The objective of the present contribution is to give a survey of the annotation of information structure in the Czech part of the Prague Czech-English Dependency Treebank. We report on this first step in the process of building a parallel annotation of information structure in this corpus, and elaborate on the automatic pre-annotation procedure for the Czech part. The results of the pre-annotat...

متن کامل

Annotation Procedure in Building the Prague Czech-English Dependency Treebank

In this paper, we present some organizational aspects of building of a large corpus with rich linguistic annotation, while Prague Czech-English Dependency Treebank (PCEDT) serves as an example. We stress the necessity to divide the annotation process into several well planed phases. We present a system of automatic checking of the correctness of the annotation and describe several ways to measu...

متن کامل

Announcing Prague Czech-English Dependency Treebank 2.0

We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high level overview of the underlyi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004